Domain adaptation and sample bias correction theory and algorithm for regression

Authors

  • Corinna Cortes
  • Mehryar Mohri
Abstract

We present a series of new theoretical, algorithmic, and empirical results for domain adaptation and sample bias correction in regression. We prove that the discrepancy is a distance for the squared loss when the hypothesis set is the reproducing kernel Hilbert space induced by a universal kernel such as the Gaussian kernel. We give new pointwise loss guarantees based on the discrepancy of the empirical source and target distributions for the general class of kernel-based regularization algorithms. These bounds have a simpler form than previous results and hold for a broader class of convex loss functions, not necessarily differentiable, including Lq losses and the hinge loss. We also give finer bounds based on the discrepancy and a weighted feature discrepancy parameter. We extend the discrepancy minimization adaptation algorithm to the more significant case where kernels are used and show that the problem can be cast as an SDP similar to the one in the feature space. We also show that techniques from smooth optimization can be used to derive an efficient algorithm for solving such SDPs, even for very high-dimensional feature spaces and large samples. We have implemented this algorithm and report the results of experiments with both artificial and real-world data sets, demonstrating its benefits both for the general scenario of adaptation and for the more specific scenario of sample bias correction. Our results show that it can scale to large data sets of tens of thousands of points or more and that it improves performance.
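As a minimal sketch of the quantity the abstract builds on, assume the usual definition of the discrepancy, disc(P, Q) = max over h, h' in H of |L_P(h', h) - L_Q(h', h)|, and restrict to an explicit, finite-dimensional feature space with the bounded linear hypothesis set H = {x -> w·x : ||w|| <= Λ}. For the squared loss, L_P(h', h) = (w' - w)ᵀ M_P (w' - w), where M_P is the second-moment matrix of P, so maximizing over ||w' - w|| <= 2Λ gives disc(P, Q) = 4Λ² ||M_P - M_Q||₂ (spectral norm). The code below evaluates this on empirical samples; the optional source weights stand in for the reweighted source distribution that discrepancy minimization searches over. It does not reproduce the paper's kernelized SDP or its smooth-optimization solver, and every function name here is hypothetical. It is written in plain Python with power iteration to stay dependency-free.

```python
import math
import random


def second_moment(points, weights=None):
    """Weighted second-moment matrix sum_i w_i x_i x_i^T (uniform 1/n weights if None)."""
    n, d = len(points), len(points[0])
    if weights is None:
        weights = [1.0 / n] * n
    m = [[0.0] * d for _ in range(d)]
    for w, x in zip(weights, points):
        for r in range(d):
            for c in range(d):
                m[r][c] += w * x[r] * x[c]
    return m


def spectral_norm_sym(a, iters=500, seed=0):
    """Approximate largest |eigenvalue| of a symmetric matrix A.

    Runs power iteration on A^2, which is positive semidefinite, so the
    iteration targets the largest eigenvalue of A^2, i.e. ||A||_2 squared.
    """
    d = len(a)
    a2 = [[sum(a[r][k] * a[k][c] for k in range(d)) for c in range(d)] for r in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # positive start vector
    for _ in range(iters):
        u = [sum(a2[r][c] * v[c] for c in range(d)) for r in range(d)]
        norm = math.sqrt(sum(t * t for t in u))
        if norm == 0.0:
            return 0.0  # A^2 annihilates v; treat A as the zero matrix
        v = [t / norm for t in u]
    # Rayleigh quotient v^T A^2 v for the unit vector v
    top = sum(v[r] * sum(a2[r][c] * v[c] for c in range(d)) for r in range(d))
    return math.sqrt(max(top, 0.0))


def squared_loss_discrepancy(source, target, lam=1.0, source_weights=None):
    """4 * lam^2 * ||M_source - M_target||_2 for linear hypotheses with ||w|| <= lam."""
    ms = second_moment(source, source_weights)
    mt = second_moment(target)
    d = len(ms)
    diff = [[ms[r][c] - mt[r][c] for c in range(d)] for r in range(d)]
    return 4.0 * lam ** 2 * spectral_norm_sym(diff)


if __name__ == "__main__":
    src = [(1.0, 0.0), (0.0, 1.0)]
    tgt = [(1.0, 0.0), (1.0, 0.0)]
    print(squared_loss_discrepancy(src, tgt))  # approximately 2.0
    print(squared_loss_discrepancy(src, tgt, source_weights=[1.0, 0.0]))  # 0.0
```

Putting all source weight on the first point makes the weighted source moments match the target exactly, so the discrepancy drops to zero; this is the kind of reweighting the discrepancy minimization algorithm looks for. Iterating on A² rather than A is a deliberate choice: it avoids oscillation when A has eigenvalues of equal magnitude and opposite sign, as in the example above.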


Similar resources

Improving Accuracy of DGPS Correction Prediction in Position Domain using Radial Basis Function Neural Network Trained by PSO Algorithm

Differential Global Positioning System (DGPS) provides differential corrections for a GPS receiver in order to improve the accuracy of the navigation solution. DGPS position signals are accurate but are updated very slowly. Improving the prediction accuracy of DGPS corrections has received considerable attention in past decades. In this research work, the Neural Network (NN) based on the Gaussian Radial Basis Fun...


Sample-oriented Domain Adaptation for Image Classification

Image processing is a method for performing operations on an image in order to obtain an enhanced image or to extract useful information from it. Conventional image processing algorithms do not perform well in scenarios where the training images used to learn the model (source domain) have a different distribution from the test images (target domain). Also, many real world applicat...


The Second-order Bias and MSE of Quantile Estimators

The finite sample theory using higher order asymptotics provides better approximations of the bias and mean squared error (MSE) for a class of estimators. However, no finite sample theory result is available for the quantile regression and the literature on the quantile regression has been entirely on the first-order asymptotic theory. This paper develops new analytical results on the second-or...


Sample Selection Bias Correction Theory

This paper presents a theoretical analysis of sample selection bias correction. The sample bias correction technique commonly used in machine learning consists of reweighting the cost of an error on each training point of a biased sample to more closely reflect the unbiased distribution. This relies on weights derived by various estimation techniques based on finite samples. We analyze the effe...
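The reweighting technique summarized in the Sample Selection Bias Correction Theory entry can be illustrated with a minimal sketch: each training point's squared error is multiplied by a weight before fitting. Here the model is a one-dimensional line y ≈ a·x + b with a closed-form weighted least-squares solution. The weights are assumed to be given (in practice they would be estimates, for example of the ratio of unbiased to biased densities at each point); the values, data, and function names below are hypothetical and only show the mechanics, not any particular estimation technique.

```python
def weighted_least_squares_1d(xs, ys, weights):
    """Fit y ~ a*x + b by minimizing sum_i w_i * (a*x_i + b - y_i)^2."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total  # weighted mean of x
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total  # weighted mean of y
    num = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    den = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if den == 0:
        raise ValueError("weighted x values have zero spread")
    a = num / den
    b = y_bar - a * x_bar
    return a, b


def weighted_risk(a, b, xs, ys, weights):
    """Reweighted empirical squared loss, normalized by the total weight."""
    total = sum(weights)
    return sum(w * (a * x + b - y) ** 2 for w, x, y in zip(weights, xs, ys)) / total


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.0, 1.0, 2.0, 10.0]
    # Uniform weights: the last (hypothetically over-sampled) point pulls the fit.
    print(weighted_least_squares_1d(xs, ys, [1.0, 1.0, 1.0, 1.0]))
    # Hypothetical correction weights that remove the last point's influence.
    a, b = weighted_least_squares_1d(xs, ys, [1.0, 1.0, 1.0, 0.0])
    print(a, b, weighted_risk(a, b, xs, ys, [1.0, 1.0, 1.0, 0.0]))  # 1.0 0.0 0.0
```

Changing only the weights changes which distribution the fitted line is tuned to; the analysis referenced above concerns how errors in such estimated weights affect the resulting hypothesis.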


A comparative study of quantitative mapping methods for bias correction of ERA5 reanalysis precipitation data

This study evaluates the ability of different quantitative mapping (QM) methods as a bias correction technique for ERA5 reanalysis precipitation data. Climate type and geographical location can affect the performance of the bias correction method due to differences in precipitation characteristics. For this purpose, ERA5 reanalysis precipitation data for the years 1989-2019 for 10 selected syno...



Journal:
  • Theor. Comput. Sci.

Volume 519

Publication date: 2014